Mobile DNA
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Mobile DNA's content profile, based on 27 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Bousios, A.; Primetis, E.
Motivation: The ATHILA lineage of LTR retrotransposons has colonised all branches of the plant tree of life. In Arabidopsis thaliana and A. lyrata, ATHILA elements have invaded centromeres, influencing the genetic and epigenetic organisation, and driving satellite evolution. To assess the broader significance of ATHILA across plants, a computational pipeline is needed to identify ATHILA elements with high efficiency. Existing tools lack this ability because they are optimised for broad transposon classification at the expense of precise annotation of lower taxonomic levels. Results: We present ATHILAfinder, a pipeline for accurate and large-scale discovery of ATHILA elements. ATHILAfinder uses lineage-specific sequence motifs as seeds and additional filters to build de novo intact elements. Homology-based steps rescue intact ATHILA and identify soloLTRs. A detailed identity card includes coordinates, LTR identity, coding capacity, length and other sequence features for every ATHILA. We validate ATHILAfinder in the A. thaliana Col-CEN assembly and five additional Brassicaceae species, covering four supertribes and ~30 million years of evolution. ATHILAfinder has very low false positive rates and outperforms widely-used tools like EDTA and the deep-learning-based Inpactor2 software for both recovery and precision of ATHILA. To demonstrate its usefulness, we generate insights into ATHILA dynamics across Brassicaceae. Outlook: Few computational pipelines target specific transposon lineages, yet such tools can empower their identification and downstream analyses. Our tailored approach can be adapted to other LTR retrotransposon lineages, offering new ways for high-resolution analysis of transposons.
May, G. E.; Akirtava, C.; McManus, J.
Since the discovery of viral Internal Ribosome Entry Sites (IRESes), researchers have sought to find similar elements in mammalian host genes, termed "cellular IRESes". However, the plasmid systems used to measure cellular IRES activity are vulnerable to false positives due to promoter activity in candidate IRESes. Orthogonal methods are needed to validate putative IRESes while carefully avoiding artifacts known to cause false positives. Recently, Koch et al. proposed approaches for studying IRESes, primarily circular RNA-generating plasmids, and for validating mRNA transcripts using smFISH and qRT-PCR. Here, we demonstrate confounding variables and artifacts in each of these approaches that can lead to inappropriate conclusions about potential cellular IRES activity. We show the back-splicing circRNA plasmid creates linear mRNA artifacts associated with false-positive IRES signals. Using orthogonal, gold-standard assays validated with viral IRESes, we find putative cellular IRESes reported using the back-splicing plasmid have no IRES activity. Furthermore, we demonstrate that smFISH and qRT-PCR can misidentify nuclear non-coding RNAs as mRNAs and we validate a single molecule sequencing assay for identifying genuine mRNA 5' ends. Our work establishes reliable methods for robust transcript annotation and IRES studies that avoid documented artifacts arising from bicistronic and back-splicing circRNA plasmid reporters.
Jorgensen, T. E.; Wardale, A.; Wolf Profant, S.; Amundsen, C.; Emblem, A.; Joakimsen, I. S.; Brekke, O.-L.; Karlsen, B. O.; Babiak, I.; Johansen, S. D.
Even though teleost fish and mammals share the same mitochondrial gene content and organization, the teleost mitochondrial transcriptome is still poorly understood. We characterized the mitochondrial transcriptome during zebrafish (Danio rerio) early development by long-read direct RNA sequencing. All heavy-strand specific mRNAs were found to carry 3' poly-A tails of approximately 50-60 residues, and the transcriptome profile was distinctive but practically invariant between stages. Three unusual transcripts were, however, noted. These included two mRNAs (COI and ND5 mRNAs), with significant 3' untranslated regions corresponding to antisense gene sequences, and a previously undescribed noncoding RNA, named here lncOriL. The ND5 mRNA was found to carry one third of all detected m6A methylation sites in the zebrafish mitochondrial transcriptome. The 313 nt-long lncOriL transcript had an abundance comparable to that of ND5 mRNA and it mapped to the mitochondrial genome region covering the origin of light strand replication and four flanking antisense tRNAs. A mitochondrial tRNA-derived fragment (tiRNA5-Asn), with a 35 nt perfect pairing-potential to lncOriL, was present at all stages. Additional analyses including adult zebrafish, scissortail (Rasbora rasbora), and monkfish (Lophius piscatorius) strongly corroborate the results of COI mRNA, ND5 mRNA, and lncOriL transcript prevalence among teleost fish. Surprisingly, our findings in zebrafish were further supported by mitochondrial transcriptome analyses in domestic pig (Sus scrofa) and human (Homo sapiens), including tiRNA5-Asn commonly present in human tissues, suggesting that lncOriL is ubiquitously expressed and regulated in vertebrates.
Author Summary: Mitochondria contain their own genome and produce essential RNAs needed for energy production. Although fish and mammals share the same mitochondrial gene organization, less is known about how mitochondrial RNAs are processed and regulated in teleosts. Using Nanopore direct RNA sequencing, we examined mitochondrial RNAs during early zebrafish development and discovered three unusual transcripts that include extended non-coding regions. Two of these molecules, COI and ND5 mRNAs, carry long 3' untranslated regions formed by antisense gene sequences, suggesting previously unrecognized regulatory potential. We also identified lncOriL, a highly structured long noncoding RNA that spans the origin of light-strand replication and is abundant during development. Strikingly, the same RNA features, including lncOriL and a matching tRNA-derived small RNA (tiRNA5-Asn), were found not only in zebrafish but also in human mitochondrial transcriptomes. These findings support conservation of regulatory mitochondrial RNAs across the main groups of vertebrate species. Our work reveals a new layer of mitochondrial RNA regulation and expands the current understanding of how mitochondrial gene expression is controlled.
Sattler, M. C.; Singh, A.; Bass, H. W.; Mondin, M.
Background: Maize knobs are regions of constitutive heterochromatin that are readily identified in both meiotic and somatic chromosomes. These structures have been characterized as stable throughout the cell cycle, exhibiting late replication during the S-phase, and are composed of two specific families of highly repetitive DNA sequences: K180 and TR-1. Although widely used as cytogenetic markers due to their variability in number and chromosomal position across inbred lines, hybrids, and landraces, little is known about their chromatin structure and dynamics. In this study, we analyzed chromatin accessibility of knobs using DNS-seq data across four maize tissues representing distinct developmental stages. Results: Our results reveal that K180 knobs exhibit tissue-specific variation in chromatin accessibility, transitioning between open and closed states during development. In contrast, the TR-1 knob of chromosome 4 remained consistently inaccessible across all tissues analyzed. A knob composed of both K180 and TR-1 further supported this observation, with only the K180 region showing dynamic accessibility. To validate these findings, we also analyzed other repetitive regions such as centromeres, which showed a uniformly closed chromatin structure similar to TR-1. These results suggest a unique developmental modulation of chromatin accessibility associated with K180 repeats. While the chromatin accessibility of knobs does not reach the levels observed at Transcription Start Sites (TSS), the comparison among different classes of repetitive DNA within maize constitutive heterochromatin provides compelling evidence for sequence-specific and tissue-specific chromatin dynamics. Conclusions: Our findings uncover a previously unrecognized property of maize knobs and establish a reference for future studies on chromatin organization and epigenetic regulation of repetitive DNA in plant genomes.
Forcier, T.; Cheng, E.; Tam, O. H.; Wunderlich, C.; Castilla-Vallmanya, L.; Jones, J. L.; Quaegebeur, A.; Barker, R. A.; Jakobsson, J.; Gale Hammell, M.
Transposable elements (TEs) are mobile genetic sequences that can generate new copies of themselves via insertional mutations. These viral-like sequences comprise nearly half the human genome and are present in most genome-wide sequencing assays. While only a small fraction of genomic TEs have retained their ability to transpose, TE sequences are often transcribed from their own promoters or as part of larger gene transcripts. Accurately assessing TE expression from each individual genomic TE locus remains an open problem in the field, due to the highly repetitive nature of these multi-copy sequences. These issues are compounded in single-cell and single-nucleus transcriptome experiments, where additional complications arise due to sparse read coverage and unprocessed mRNA introns. Here we present our tool for single-cell TE and gene expression analysis, TEsingle. Using synthetic datasets, we show the problems that arise when not properly accounting for intron retention events, failing to address uncertainty in alignment scoring, and failing to make use of unique molecular identifiers for transcript resolution. Addressing these challenges has enabled an accurate TE analysis suite that simultaneously tracks gene expression as well as locus-specific resolution of expressed TEs. We showcase the performance of TEsingle using single-nucleus profiles from substantia nigra (SN) tissues of Parkinson's Disease (PD) patients. We find examples of young and intact TEs that mark dopaminergic neurons (DA) as well as many young TEs from the LINE and ERV families that are elevated in PD neurons and glia. These results demonstrate that TE expression is highly cell-type and cellular-state specific and elevated in particular subsets of neurons, astrocytes, and microglia from PD patients.
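To illustrate why unique molecular identifiers (UMIs) matter for transcript resolution, here is a minimal sketch of UMI collapsing. It is not TEsingle's implementation, which the abstract does not describe; the record layout `(cell, feature, umi)` and the locus name are hypothetical, and real pipelines typically also correct UMI sequencing errors.

```python
from collections import defaultdict


def umi_counts(records):
    """Count distinct UMIs per (cell, feature) pair.

    `records` is an iterable of (cell_barcode, feature, umi) tuples,
    a hypothetical layout chosen for this illustration. Reads sharing
    a cell, feature, and UMI are treated as PCR duplicates of one
    original molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell, feature, umi in records:
        seen[(cell, feature)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the first two are duplicates of one molecule.
reads = [
    ("cell1", "TE_locus_1", "AAAC"),
    ("cell1", "TE_locus_1", "AAAC"),
    ("cell1", "TE_locus_1", "GGTT"),
    ("cell2", "TE_locus_1", "AAAC"),
]
counts = umi_counts(reads)
```

Counting raw reads instead would report three molecules for `cell1`, which is the kind of inflation that UMI-aware counting avoids.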
Domingues-Silva, B.; Azzalin, C. M.
Mammalian telomeric DNA comprises long tracts of tandem TTAGGG repeats. The same repeats are also found at internal chromosomal regions called interstitial telomeric sequences (ITSs). Telomeres are transcribed into UUAGGG-containing transcripts, named TERRA, which serve multiple functions in maintaining telomere integrity. Complementary RNAs containing C-rich telomeric repeats, named ARIA, have also been identified in a few yeast mutants and in mammalian cells with dysfunctional telomeres. The molecular features and functions of ARIA remain understudied, mainly due to its low abundance and the lack of suitable cellular systems. Here, we show that Chinese hamster ovary (CHO) cells produce abundant TERRA and ARIA transcripts, predominantly originating from ITSs. Both RNAs are polyadenylated, exhibit relatively short half-lives and form large cellular foci. We also show that ARIA depletion leads to exposure of single-stranded (ss) DNA at ITSs and that ssDNA exposure increases when ITS DNA is damaged. SsDNA formation does not require the DNA damage signaling kinases ATM and ATR, nor the exonucleases DNA2 and EXO1; however, ATM prevents excessive ssDNA accumulation when ARIA function is inhibited. These findings establish CHO cells as a powerful model to dissect telomeric RNA functions and reveal ARIA as a key regulator of telomeric repeat DNA integrity.
Gorbenko, I. V.; Scherbakov, D. Y.; Zverintseva, K. M.; Konstantinov, Y. M.
Short Interrupted Repeats Cassettes (SIRC) are recently discovered eukaryotic DNA elements that possess many traits of satellite DNA and mobile genetic elements and consist of short direct repeats interspersed with diverse spacer sequences. The SIRC ensemble of an individual species is highly heterogeneous and cannot be studied using alignment methods. We found that the number of similar SIRC sequences in a given pair of species generally correlates with their taxonomic distance; at the same time, closely related species can possess very diverged SIRC ensembles, which makes the SIRC evolutionary pattern closer to that of mobile genetic elements. The SIRC sequences form clusters with comparable sequence patterns that are likely to follow a doublet evolutionary model, which strongly suggests that the SIRC structure is maintained by evolutionary selection. Several SIRC sequences of Arabidopsis were found to be of ancient origin, with an evolutionary history traceable as far back as the moss clade. We carried out unbiased detection of SIRC ensembles in 10 plant genomes and found that, despite very high intraspecies heterogeneity, SIRC sets possess a strong interspecies phylogenetic signal. Key message: Short Interrupted Repeats Cassettes are elements of ancient origin and could potentially be used to trace organism history and to facilitate synteny and Hi-C analysis.
Pacht, E.; Warren, J.; Toor, R.; Glass, K. C.; Greenyer, H.; Fritz, A.; Banerjee, B.; Frietze, S. C.; Lian, J.; Gordon, J.; Stein, G.; Stein, J.
Long noncoding RNAs (lncRNAs) are important regulators of gene expression and are frequently dysregulated in cancer. The mitotically associated lncRNA MANCR is highly expressed in aggressive cancers and contributes to genomic instability in triple-negative breast cancer (TNBC), but the molecular mechanisms underlying its activity remain poorly defined. Here we integrate computational and experimental approaches to examine the structure and regulatory interactions of MANCR isoforms. Analysis of transcriptomic datasets revealed tumor-type-specific expression patterns for seven MANCR isoforms in breast cancer cell lines. Computational prediction of RNA secondary structures identified conserved structural features across isoforms, suggesting potential functional specialization. We identify p53 as a MANCR-interacting protein through computational docking and RNA immunoprecipitation sequencing (RIP-seq) and demonstrate that MANCR depletion reduces p53-dependent transcriptional activity. Chromatin isolation by RNA purification sequencing (ChIRP-seq) revealed 1,250 genomic regions associated with MANCR, including enrichment of p53 consensus motifs and GC-rich sequence elements. Motif analysis further identified candidate sequence features associated with MANCR-occupied chromatin regions. Computational prediction of RNA-miRNA interactions identified multiple potential miRNA binding sites across MANCR isoforms, including miR-6756-5p, which targets the androgen receptor (AR). Consistent with this prediction, AR expression decreased following MANCR knockdown in TNBC cells. Together, these results suggest that MANCR isoforms may contribute to transcriptional regulation in TNBC through interactions with chromatin, p53 signaling pathways, and potential miRNA regulatory networks.
One Sentence Summary: The mitotically associated lncRNA MANCR is prevalent in aggressive cancers, where it interacts with DNA, p53, and miRNAs to mediate multiple levels of epigenetic transcriptional control in triple-negative breast cancer.
Carvajal-Garcia, J.; Merrikh, H.
DNA replication and transcription occur simultaneously on the same template, leading to conflicts between the two machineries. Conflicts stall replication forks and lead to genome instability, mutagenesis, and the formation of deleterious R-loop structures. There are two types of conflicts: depending on the strand where a gene is encoded, the two machineries either meet head-on or co-directionally. Head-on conflicts are significantly more detrimental to cells than co-directional conflicts. Despite many studies across various organisms, how the replication fork structure is impacted by these encounters remains unclear. Here, we performed an unbiased genetic screen using a transposon library to identify factors essential for surviving head-on conflicts in Bacillus subtilis. Our screen identified three hits: RNase HIII, AddA and AddB. Our prior work had shown that RNase HIII, which processes R-loops, is essential for surviving head-on conflicts. However, AddA and AddB, which function together as a complex to process blunt DNA ends, had not previously been identified as essential conflict resolution factors. Through follow-up genetic and biochemical analyses, we found that the helicase activity but not the nuclease activity of this complex is required for conflict resolution. Based on the fundamental properties of DNA at replication fork structures, our work collectively indicates that upon head-on conflicts, the nascent strands form a reversed fork structure, which is then unwound by AddAB, leading to re-annealing of the nascent strands to the parental strands. This process re-establishes an intact replication fork that can be used to restart replication after conflicts with transcription.
Michalek, K.; Bhattacharjee, S.; Movasati, A.; Clerc, V.; Andres, J.; Hotz, A.; Metzner, K. J.
Latent HIV-1 proviruses remain the major barrier to curing HIV infection. Although many of these proviruses are defective, with large internal deletions and hypermutations, the mechanisms underlying their formation are still poorly understood. In this study, we applied CRISPR/Cas9 knockout screens to identify DNA damage response (DDR) proteins that contribute to the formation of defective HIV-1 proviruses carrying large internal deletions. Using an HIV-1-based dual-fluorophore vector as a model, we distinguished cells harbouring intact proviruses from those carrying large internal deletions by flow cytometry and cell sorting. We then validated top candidates using CRISPR-mediated gene activation and small interfering RNA-mediated knockdown, and we measured gene and protein expression by quantitative PCR and Western blotting. Across these approaches, the helicase-like transcription factor HLTF emerged as a consistent modulator of large internal deletions: increased HLTF expression raised the proportion of cells carrying defective proviruses, whereas reduced HLTF expression had the opposite effect. Additional repair factors, including RAD1, RAD18, TREX2, and ZRANB3, also influenced the balance between intact and defective proviruses, suggesting that multiple DNA repair pathways cooperate in this process. Deep sequencing of reporter proviruses confirmed the presence of large internal deletions in the populations identified as defective. Our data indicate that several DNA damage response proteins, including HLTF, are involved in the generation of defective proviruses and may constitute a previously undescribed host defense mechanism against HIV-1.
Author Summary: When HIV-1 infects a cell, it copies its genetic material (RNA) into DNA and inserts this DNA into the cell's genome, giving rise to proviruses that can persist for long periods and become part of the host DNA. Many of these viral DNA copies are defective, often missing large parts of their genome, but we still do not fully understand how these large deletions arise. In this study, we used a genetic screening approach to switch off many human DNA repair genes and asked how this affected the balance between intact and defective HIV proviral DNA. We used an HIV-1-based dual-colour reporter vector allowing us to distinguish intact from deleted viral DNA by simple fluorescence read-outs. We found that several human DNA repair factors, in particular a protein called HLTF, change how often large deletions appear. Our results suggest that normal DNA repair processes in infected cells can sometimes turn incoming HIV-1 DNA into defective forms that cannot support productive infection. This work points to host DNA repair as a contributor to the large pool of defective HIV-1 DNA seen in people with HIV (PWH) and raises the possibility that these pathways could one day be harnessed to make infections less harmful.
Orozco-Arias, S.; Ferrer-Pomer, I.; Rodrigues de Goes, F.; Gaviria-Orrego, S.; Gomiz-Fernandez, J.; Llatser-Torres, J.; Paschoal, A. R.; Guyot, R.; Gabaldon, T.
Transposable elements (TEs) are major drivers of genome evolution, yet their annotation and classification remain inconsistent and hard to reproduce across species. Fragmented repeats, lineage-specific innovations, and heterogeneous taxonomies across databases and tools complicate comparisons and slow progress in TE biology. To address this, we developed PanTEon, a cross-kingdom deep learning framework for reproducible TE classification that combines a harmonized database with an open, modular benchmarking platform. The PanTEon Database is an automatically curated, taxonomically broad TE repository spanning animals, plants, and fungi. The PanTEon platform standardizes training, evaluation, and inference across nine Machine Learning methods, while remaining extensible to user-defined architectures. Using this framework, we benchmark state-of-the-art Machine Learning-based TE classifiers across TE superfamilies and major eukaryotic lineages and find that performance varies markedly by kingdom and superfamily. Ensemble approaches and phylum-specific models improve predictive F1 scores, but cross-species generalization remains a major challenge. Together, PanTEon Database and PanTEon platform provide a reproducible, scalable, and extensible foundation for TE classification, enabling standardized evaluation of future AI methods and supporting community-driven annotation efforts.
Iki, T.; Kai, T.; Isshiki, W.; Kozuka-Hata, H.; Oyama, M.
Silencing complexes formed by PIWI-clade Argonaute (Ago) proteins and PIWI-interacting RNAs (piRNAs) are essential guardians of genome integrity, controlling the deleterious activities of transposable elements (TEs) in the animal germline. However, our understanding of PIWI-piRNA-directed TE silencing remains incomplete. Here, we systematically characterize the proximity proteome of the PIWI members Piwi, Aubergine (Aub), and Ago3 in the germline of Drosophila ovaries. Functional screening identifies previously uncharacterized factors involved in TE silencing, including the H3K4me3 writer and transcriptional coactivator Set1. Transcriptome analysis reveals that Set1 acts as an indispensable repressor for TEs, particularly those forming telomeres. The involvement of Set1 in the Piwi pathway is further supported by its critical role in the production of antisense, TE-targeting piRNAs. Notably, the catalytic activity of Set1 is dispensable for TE silencing. Genome-wide chromatin binding analysis using CUT&Tag demonstrates that Set1 preferentially associates with TE sequences and localizes to subtelomeric piRNA cluster loci, suggesting a role in promoting piRNA precursor transcription through direct binding. Collectively, these findings uncover a noncanonical function of Set1 in Piwi-mediated TE silencing and telomere control in germline nuclei.
Trummer, N.; Weyrich, M.; Ryan, P.; Furth, P. A.; Hoffmann, M.; List, M.
Anti-hormonal therapies such as selective estrogen receptor modulators like tamoxifen or aromatase inhibitors like letrozole represent a cornerstone for breast cancer prevention and therapy of estrogen receptor-positive breast cancer. Therapeutic monitoring can include blood tests and imaging; however, genetically-based approaches are not yet in practice. Ideally, a test would be able to detect a positive molecular response across different estrogen pathway-suppressive approaches. Circular RNAs are a species of non-coding RNAs detectable in plasma that have been proposed as non-invasive therapeutic biomarkers. To determine whether a set of specific circular RNAs is altered across estrogen-suppressive pathway approaches, we analyzed mammary gland-specific total RNA sequencing data from two individual genetically engineered mouse models (GEMMs) of estrogen pathway-induced breast cancer, with or without exposure to tamoxifen or letrozole. The nf-core/circrna pipeline was used to identify circRNAs that were differentially expressed in response to either tamoxifen or letrozole. We then screened for circRNAs that were differentially regulated by both anti-hormonals. Four up-regulated and 31 down-regulated circRNAs with host genes known to be expressed in human breast epithelial cells were identified as showing reproducible differential regulation in response to anti-hormonal treatment.
Montoliu-Nerin, M.; Strunov, A.; Heyworth, E.; Schneider, D. I.; Thoma, J.; Hua-Van, A.; Courret, C.; Klasson, L. J.; Miller, W. J.
Background: Although strict maternal transmission of mitochondria is a general feature of animals and humans for ensuring homogeneity in mitochondrial DNA (mtDNA) across generations, exceptions were reported in the recent past. For example, some extremely rare but spectacular cases of heteroplasmy and paternal transmission in humans have questioned the universal evolutionary principle. Hence, as an alternative, the Mega-NUMT concept was coined to explain this discovery and was thereafter partly proven to exist. This concept expands on the quite common transfer of mtDNA fragments to the nucleus (NUMTs) by considering the existence of multicopy mitochondrial nuclear insertions. Mega-NUMT reports are currently restricted to a few cases in animals, including humans. However, even in humans, their detailed genomic organization, natural prevalence, and potential biological functions remain unclear. Methodology/Principal Findings: Here, we discovered that up to 60 full-sized mitochondrial genomes are integrated into the nuclear genome of the neotropical fruit fly Drosophila paulistorum using long-read sequencing and confirmed their presence by in situ hybridization. The copies are organized in one cluster on chromosome 3, which we, due to its similarity with the Mega-NUMT concept, designated the "Dpau Mega-NUMT". Contrary to the rarity in humans, this Mega-NUMT is found at high prevalence (40%) in both long-term laboratory lines and natural D. paulistorum populations of different semispecies. Additionally, the mitochondrial copies in the Mega-NUMT cluster are phylogenetically separated from the current mitotypes of D. paulistorum. Together, these observations suggest long-term maintenance of the Mega-NUMT in nature. Hence, we propose that the Dpau Mega-NUMT may have been transferred to the nuclear genome before D. paulistorum semispecies radiation and maintained at relatively high prevalence in nature by balancing selection due to yet undetermined functions.
Conclusions/Significance: To our knowledge, this is the first verified existence and detailed dissection of a Mega-NUMT outside cats and humans. We show that Mega-NUMTs can be persistent in nature, even at high prevalence, potentially due to balancing selection. Our findings strengthen the importance of high-quality long-read sequencing technologies for deciphering complex repeat-rich genomic regions to deepen our understanding of the dynamics of genome evolution within genomic "dark matter".
Shen, J.; Tang, S.; Xia, Y.; Qin, J.; Xu, H.; Tan, Z.
Background: Conventional models of human ribosomal DNA (rDNA) array organization have historically depended on transcription-centric boundaries, partitioning the unit into a ~13 kb rDNA transcription region and a monolithic ~31 kb intergenic spacer (IGS). While our previous identification of Duplication Segment Units (DSUs) mapped these arrays based on an intuitive analysis of the microsatellite density landscape of the complete reference human genome, our present deep mining of this landscape has revealed a more accurate rDNA Gene Unit Pattern. Methods & Results: In this study, we conducted a deep mining analysis of our previously established microsatellite density landscape of the T2T-CHM13 assembly, focusing specifically on nucleolar organizing regions (NORs). We suggest a more accurate rDNA Gene Unit Pattern containing a (CTTT)n microsatellite aggregation ahead of the rDNA gene and a (CT)n microsatellite aggregation behind the gene, rather than a pattern featuring an IGS region inserted between two rDNA genes. Conclusions: A correct rDNA gene pattern of the human genome probably includes a (CTTT)n microsatellite aggregation ahead of the gene and a (CT)n microsatellite aggregation behind it, which possibly constitute cis- and trans-regulating regions; the (CTTT)n and (CT)n microsatellite aggregations may provide two different local stable DNA structures for regulatory protein binding.
He, J.; Peng, C.; Zhang, Y.; Wang, Z.; Zhang, H.; Fang, L.; Zhao, P.
Transposable elements (TEs) are pivotal drivers of eukaryotic genome evolution and phenotypic diversity. However, their functional contributions to complex traits remain largely obscured by expression quantification challenges arising from high sequence homology and multi-mapping ambiguities. Here, we present LATTE, an efficient computational framework for defining and quantifying TE expression at locus-specific resolution by leveraging an innovative multi-indicator Expectation-Maximization (EM) algorithm. Extensive benchmarking against simulated datasets demonstrated that LATTE significantly outperformed existing state-of-the-art tools, achieving an accuracy of 0.998 at the subfamily level and 0.839 at the locus-specific level. Applying LATTE to 813 RNA-seq datasets across humans, cattle, and chickens, we quantified expression profiles of 2,703 TEs, followed by TE-expression quantitative trait loci (TE-eQTL) mapping. The colocalization rate between TE-eQTL and host gene-eQTL was low, revealing a distinct regulatory landscape of TE expression. This decoupling between TEs and host genes is likely mediated by the differential expression of alternative transcripts. Through integrated TE-eQTL and genome-wide association studies on 3,746 complex traits across three species, we demonstrated that TEs constitute 204 (8.7%) additional associations with complex traits beyond gene-eQTL. More specifically, the Sjögren's syndrome-associated variant rs10954213 acts as a TE-eQTL that shifts the splicing landscape of IRF5, upregulating TE-containing transcripts while simultaneously suppressing canonical ones. Collectively, LATTE provides an efficient framework for studying TE expression across species, and our findings highlight the key role of TEs in understanding the genetic architecture of complex phenotypes.
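For readers unfamiliar with how Expectation-Maximization resolves multi-mapping reads, the following is a minimal, generic sketch. It is not LATTE's multi-indicator algorithm, whose details the abstract does not give; the function name, input layout, and locus labels are hypothetical, and effective-length normalization is deliberately omitted.

```python
def em_locus_abundance(read_candidates, n_iter=50):
    """Estimate relative locus abundance from ambiguously mapped reads.

    `read_candidates` is a list in which each item lists the loci one
    read aligns to (a hypothetical input layout). Every read is assumed
    to have at least one candidate locus.
    """
    loci = sorted({locus for cands in read_candidates for locus in cands})
    # Start from a uniform abundance estimate.
    theta = {locus: 1.0 / len(loci) for locus in loci}
    for _ in range(n_iter):
        # E-step: split each read across its candidate loci in
        # proportion to the current abundance estimates.
        expected = {locus: 0.0 for locus in loci}
        for cands in read_candidates:
            total = sum(theta[locus] for locus in cands)
            if total == 0.0:
                continue  # guard against numerical underflow
            for locus in cands:
                expected[locus] += theta[locus] / total
        # M-step: renormalize the expected counts into proportions.
        n_assigned = sum(expected.values())
        if n_assigned == 0.0:
            break
        theta = {locus: c / n_assigned for locus, c in expected.items()}
    return theta


# Hypothetical example: three reads unique to locus A, one unique to
# locus B, and one read that maps equally well to both.
reads = [["A"], ["A"], ["A"], ["A", "B"], ["B"]]
abundance = em_locus_abundance(reads)
```

In this toy case the ambiguous read is shared 3:1 in favour of locus A at convergence, because the uniquely mapped reads already suggest that A is the more abundant locus.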
Simmons, J. R.; Xue, T.; McCord, R. P.; Wang, J.
Programmed DNA elimination (PDE) is a notable exception to genome integrity, characterized by significant DNA loss during development. In many nematodes, PDE is initiated by DNA double-strand breaks (DSBs), which lead to chromosome fragmentation and subsequent DNA loss. However, the mechanism of nematode programmed DNA breakage remains largely unclear. Interestingly, in the human and pig parasitic nematode Ascaris, no conserved motif or sequence structures are present at chromosomal breakage regions (CBRs), suggesting the recognition of CBRs may be sequence-independent. Using Hi-C, we revealed that Ascaris CBRs engage in three-dimensional (3D) interactions before PDE, indicating that physical contacts between break regions may contribute to the PDE process. The 3D interactions are established in both Ascaris male and female germlines, demonstrating inherent genome organization associated with the CBRs and to-be-eliminated sequences. In contrast, in the unichromosomal horse parasite Parascaris univalens, transient pairwise interactions between neighboring CBRs that will form the ends of future somatic chromosomes were observed only during PDE. Intriguingly, we found that Ascaris PDE, which converts 24 germline chromosomes into 36 somatic ones, induces specific compartmentalization changes. Remarkably, Parascaris PDE generates the same set of 36 somatic chromosomes, and the 3D compartment changes following PDE are consistent between the two species. Overall, our findings suggest that CBRs spatially demarcate the retained and eliminated DNA and may contribute to their spatial organization during Ascaris PDE. We also demonstrated that the 3D genome reorganization of the somatic chromosomes in these nematodes following PDE is evolutionarily and developmentally conserved.
Knudson, L. A.; Kosti, A.; Moss, K. R.; Shi, L.; Nguyen, G. N.; Janusz-Kaminska, A.; Zhou, E. X.; Hildebrandt, R. P.; Wang, E. T.; Bassell, G. J.
Muscleblind-like (MBNL) RNA-binding proteins (RBPs) possess modular domains that mediate regulation of alternative splicing and RNA localization. Myotonic Dystrophy Type 1 is a CTG repeat expansion disorder in which MBNL is sequestered into intranuclear RNA foci, impairing its function. Previous studies found that MBNL self-associates through its exon 7, but the nature of this interaction is not well understood. We identified a cysteine in MBNL1 exon 7 that enables dimerization through formation of an intermolecular disulfide bond. We likewise demonstrate that MBNL2 dimerizes by forming disulfide bonds between multiple cysteines in its carboxy-terminus. Nucleocytoplasmic fractionation revealed a greater proportion of MBNL1 dimer in the nucleus, suggesting a nuclear function for the MBNL1 dimer. To test for a connection between MBNL1 dimerization and MBNL1-mediated regulation of alternative splicing, we mutated this cysteine to alanine (C325A) and performed RNA-seq. We uncovered novel splicing events sensitive to MBNL1 dimerization. We also found that MBNL1 C325A, when co-expressed with expanded CTG repeats, produces smaller, more numerous foci, suggesting a role for the MBNL1 dimer in maintaining foci integrity. These results provide insight into biological and pathological mechanisms of MBNL1 dimerization and suggest other RBPs might similarly dimerize to regulate function.
Ahn, J.; Zack, D.; Zhang, P.
Accurate detection of RNA splice variants is often hindered when transcripts lack large distinguishable exonic regions, making conventional PCR strategies challenging. We developed a simple melting temperature (Tm)-guided exon-exon junction (EEJ) RT-PCR method to enable variant-specific detection under these conditions. Uni-directional primers spanning exon-exon junctions were designed so that roughly half of each primer anneals to each of the two adjacent exons. The Tm of each half-site was set >7 °C below the annealing temperature, preventing stable binding to individual exons and enforcing junction-dependent amplification. The method was evaluated using HTRA1-AS1 long noncoding RNA variants that share overlapping exon sequences but differ in splice connectivity. HTRA1-AS1 comprises five variants, only one of which has a large distinguishable exon. Tm-guided EEJ primers robustly discriminated the remaining four variants. After optimization, amplification yielded sharp, single bands with minimal cross-reactivity. Compared with conventional designs, this approach reduced heteroduplex and heteroquadruplex formation, improving band clarity. Sanger sequencing confirmed junction specificity, and the method performed well in multiplex settings. Overall, Tm-guided EEJ RT-PCR is a cost-effective, high-resolution approach for detecting RNA variants lacking easily distinguishable exonic regions, and it is readily compatible with standard RT-PCR and qPCR workflows.
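The half-site rule above can be checked with a few lines of code. This is a minimal sketch, not the authors' design tool: the abstract does not say how Tm was computed, so the Wallace rule (2 °C per A/T, 4 °C per G/C, a rough estimate for short oligonucleotides) is an assumption, as are the function names and the example sequences.

```python
def wallace_tm(seq: str) -> float:
    """Rough Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumed here for illustration; it is only a coarse estimate
    and is intended for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def half_sites_ok(upstream_half: str, downstream_half: str,
                  annealing_temp: float, margin: float = 7.0) -> bool:
    """Return True when each half-site Tm is more than `margin` °C
    below the annealing temperature, so that neither half should bind
    stably to its exon on its own."""
    limit = annealing_temp - margin
    return (wallace_tm(upstream_half) < limit
            and wallace_tm(downstream_half) < limit)


# Hypothetical 10-nt half-sites on either side of a junction.
up, down = "ACGTACGTAC", "TGCATGCATG"
passes_at_60 = half_sites_ok(up, down, annealing_temp=60.0)
passes_at_36 = half_sites_ok(up, down, annealing_temp=36.0)
```

A nearest-neighbor Tm model would be a more realistic choice for actual primer design; the check itself would stay the same.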
Pujal, D.; Ylla, G.; Bau, J.; Piulachs, M.-D.
The cockroach Blattella germanica possesses panoistic ovaries, in which oocytes lack nurse cells and therefore need to rely on their own transcriptional activity to support embryogenesis. Ovarian development in this species involves the development of a single basal ovarian follicle (BOF) per gonadotropic cycle, a process strictly regulated by endocrine signals, primarily juvenile hormone and ecdysone, which act at both the transcriptional and translational levels. In addition, transcriptional activity in these ovaries is necessary for both regulation and genome protection, and at this level, PIWI-interacting RNAs (piRNAs) play an essential role. Although insect ovaries are known to be particularly rich in piRNAs, their function in ovary maturation is still not well defined. To address this, we characterize piRNA expression dynamics across seven key developmental and reproductive stages, ranging from late nymphal instars to post-vitellogenic adults. piRNA expression in B. germanica shows coordinated fluctuations. Expression remains stable in previtellogenic ovaries, whereas vitellogenic ovaries show pronounced changes. Moreover, vitellogenic ovaries exhibit reduced piRNA diversity due to strong enrichment of a subset of highly expressed piRNAs. Our data show that although piRNAs predominantly map to transposable elements, particularly LINEs, there is a notable increase in gene-derived piRNAs toward the end of the cycle. Our results suggest regulatory roles for piRNAs in modulating both TEs and mRNAs during BOF maturation, likely related to changes in the follicular cell program.